Chapter 8
Getting Your Data into the Computer
IN THIS CHAPTER
Understanding levels of measurement (nominal, ordinal, interval, and ratio)
Defining and entering different kinds of data into your research database
Making sure your data are accurate
Creating a data dictionary to describe the data in your database
Before you can analyze data, you have to collect them and get them into the computer in a form that’s suitable
for analysis. Chapter 5 describes this process as a series of steps: figuring out what data you need
and how they are structured, creating data entry forms and computer files to hold your data, and
entering and validating your data.
In this chapter, we describe a crucial component of that process: storing the data
properly in your research database. Different kinds of data can be represented in the computer in
different ways. At the most basic level, there are numerical values and classifications, and most of us
can immediately tell the two apart — you don’t have to be a math genius to recognize “age” as
numerical data, and “occupation” as categorical information.
So why are we devoting a whole chapter to describing, entering, and checking different types of data?
It turns out that the topic of data storage is not quite as trivial as it may seem at first. You need to be
aware of some important details or you may wind up collecting your data the wrong way and finding
out too late that you can’t run the appropriate analysis. This chapter starts by explaining the different
levels of measurement, and shows you how to define and store different types of data. It also suggests
ways to check your data for errors, and explains how to formally describe your database so that others
are able to work with it if you’re not available.
Looking at Levels of Measurement
Around the middle of the 20th century, the idea of levels of measurement caught the attention of
biological and social-science researchers and, in particular, psychologists. One classification scheme,
which has become widely used (at least in statistics textbooks), recognizes four levels at which
variables can be measured. The four levels are nominal, ordinal, interval, and ratio:
Nominal variables are expressed as mutually exclusive categories, like country of origin (United
States, China, India, and so on), type of care provider (nurse, physician, social worker, and so on),
and type of bacteria (such as coccus, bacillus, rickettsia, mycoplasma, or spirillum). The term nominal
indicates that the categories are just names, so the sequence in which you list them is purely arbitrary. For
example, listing type of care provider as nurse, physician, and social worker is no more or less
natural than listing them as social worker, nurse, and physician.